-
-
- Logging user access is something we definitely want, for a number
- of reasons:
-
- - Justifying the project by showing statistics
- - Demonstrating the readership profiles of different material
- - Demonstrating the usage profile across sites
-
- The privacy issue is very important, and so I had intended to
- log each action "A read B" as "A read something" and "B was read"
- independently. This would give the basic profiles. Anything further
- would be an infringement of privacy, so the user would have to
- agree to it. The problem is that the sociological data would then
- be immediately filtered ... all the alt.sex.bondage readers would
- filter themselves out! Perhaps two levels are needed.
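A minimal sketch of the two independent logs described above, assuming a reader identifier and a document identifier are available at each access. The class `DecoupledLog`, its method `record_read`, and the in-memory counters are hypothetical illustrations. No record ever holds both the reader and the document, and no timestamps or ordering are kept, so the basic profiles can be computed but "A read B" cannot be rebuilt from the logs alone.

```python
from collections import Counter


class DecoupledLog:
    """Hypothetical store keeping "A read something" and "B was read" apart.

    No entry pairs a reader with a document, and no timestamps or
    ordering are stored that could be used to re-join the two tallies.
    """

    def __init__(self) -> None:
        self.reads_by_reader = Counter()    # "A read something"
        self.reads_by_document = Counter()  # "B was read"

    def record_read(self, reader: str, document: str) -> None:
        # Each half of the event goes into a separate, unlinked tally.
        self.reads_by_reader[reader] += 1
        self.reads_by_document[document] += 1
```

For example, after `record_read("A", "B")` the reader tally shows that A read one document and the document tally shows that B was read once, but nothing links the two entries. A second, opt-in level could log the full pairs for users who agree to it.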
-
- The network load is also something which I considered a possible
- problem, so I decided on a scheme (have I said this before?) in which
- an event would be logged with probability p=exp(-a*t), and p would be
- included in the message so that the message can be given weight 1/p
- in the analysis. The time t over which p decays is measured from the
- compilation of the source, so you get more fine-grained info on new
- releases. The messages would be UDP packets so as not to clog
- gateways.
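A minimal sketch of the sampling scheme, assuming `a` is a decay constant per second, `t` is the number of seconds since the running binary was compiled, and events are encoded as JSON purely for illustration. The function names, the JSON payload, and any collector host and port passed to `send_datagram` are hypothetical. Weighting each received message by 1/p is inverse-probability weighting: on average it recovers the total number of events even though most were never sent. Datagrams lost in transit are not corrected for.

```python
import json
import math
import random
import socket
from typing import List, Optional


def sampling_probability(a: float, age_seconds: float) -> float:
    """p = exp(-a * t), with t the time since the source was compiled."""
    return math.exp(-a * age_seconds)


def maybe_make_message(event: dict, a: float, compiled_at: float,
                       now: float, rng: random.Random) -> Optional[bytes]:
    """With probability p return a datagram payload carrying p; otherwise None."""
    p = sampling_probability(a, now - compiled_at)
    if rng.random() >= p:
        return None  # this event is not reported
    # p travels with the event so the analysis can weight it by 1/p.
    return json.dumps({**event, "p": p}).encode("utf-8")


def send_datagram(payload: bytes, host: str, port: int) -> None:
    """Fire-and-forget UDP send: no connection set-up, no retransmission."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


def estimated_event_count(payloads: List[bytes]) -> float:
    """Each received message stands for 1/p events (inverse-probability weighting)."""
    return sum(1.0 / json.loads(msg)["p"] for msg in payloads)
```

With a hypothetical a = 1e-6 per second, a binary compiled one day earlier reports about 92% of its events and one compiled 30 days earlier about 7.5%, which gives the finer-grained data on new releases.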
-
- We have a monitoring service here which is already monitoring the use
- of other CERN software; I am not sure whether it is TCP- or UDP-
- based.
-
-
- *Coincidence:* As I write, the file system on our server has JUST
- filled up while processing the server's January log data....
- is this a warning?!
-
- BTW: Marc, you were going to log how LONG an article was read for.
- I think that is very tricky... if you can come up with a good measure
- of how much the person LIKED the article (automatically) then you
- will really have something. Someone whose name I forget in Stockholm
- just gave a talk about inferring document affinities from readership
- profiles... using the user as a more refined text comparison program
- than a word-occurrence engine. I suggested WWW usage data as a source,
- but realized that, for example, of all the documents in the talk I had
- just given with XMosaic, the one which was left on the screen for the
- longest time was quite irrelevant.
-
- Something linked with this is finding relevant material for
- a particular person. How about a service which takes someone's
- global history file and tells them everything new in the world
- which would interest them? In other words, if you do keep
- data about a particular person, then that can help them find more
- data like it... a sophisticated form of relevance feedback.
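One simple way to prototype such a service, sketched under loud assumptions: the documents named in the global history file have already been fetched as plain text, candidate new documents arrive as a mapping from URL to text, and similarity is plain bag-of-words cosine similarity. That scoring is exactly the kind of word-occurrence engine the Stockholm talk mentioned above tried to improve on; it only stands in for whatever the service would really use. All function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def term_vector(text: str) -> Counter:
    """Bag-of-words term counts over lower-cased alphanumeric tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity of two term-count vectors (0.0 if either is empty)."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm = (math.sqrt(sum(c * c for c in u.values()))
            * math.sqrt(sum(c * c for c in v.values())))
    return dot / norm if norm else 0.0


def recommend(history_texts: List[str], new_docs: Dict[str, str],
              top_n: int = 3) -> List[Tuple[str, float]]:
    """Rank new documents by similarity to the reader's whole history."""
    profile = Counter()
    for text in history_texts:
        profile.update(term_vector(text))  # the reader's interest profile
    scored = [(url, cosine(profile, term_vector(text)))
              for url, text in new_docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]
```

Relevance feedback would then come from which recommendations the reader actually follows, with those documents added back into `history_texts`.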
-
-
- - - -
-
-
- I think that, as you are collecting data from the public, the
- data should also be made available to the public, with names and
- addresses removed.
-
- Another possibility is that all servers keep logs and share the
- results... but it will always be incomplete.
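A minimal sketch of both ideas, assuming a hypothetical `AccessRecord` with client host, user name, document and month fields: each server publishes only per-document, per-month counts with names and addresses dropped, and the published summaries are then added together. The merged totals cover only the servers that chose to share, which is the incompleteness noted above.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class AccessRecord:
    host: str      # client address: never published
    user: str      # user name, if known: never published
    document: str
    month: str     # e.g. "1994-01"


def public_summary(records: Iterable[AccessRecord]) -> Counter:
    """Per-(document, month) read counts with names and addresses removed."""
    return Counter((r.document, r.month) for r in records)


def merge_summaries(summaries: List[Counter]) -> Counter:
    """Add up the summaries published by individual servers."""
    total = Counter()
    for summary in summaries:
        total.update(summary)  # Counter.update adds counts
    return total
```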
-
-
- Tim
-
-